Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems
Authors
Abstract
A mobile computing system consists of mobile and stationary nodes connected to each other by a communication network. The presence of mobile nodes in the system places constraints on the permissible energy consumption and the available communication bandwidth. To minimize the computation lost during recovery from node failures, a consistent snapshot of the system (checkpoint) must be collected periodically. Locating mobile nodes contributes to the checkpointing and recovery costs. Synchronous snapshot collection algorithms designed for static networks either force every node in the system to take a new local snapshot or block the underlying computation during snapshot collection. Hence, they are not suitable for mobile computing systems. If nodes take their local checkpoints independently, in an uncoordinated manner, each node may have to store multiple local checkpoints in stable storage. This is unsuitable for mobile nodes because they have limited memory. This paper presents a synchronous snapshot collection algorithm for mobile systems that neither forces every node to take a local snapshot nor blocks the underlying computation during snapshot collection. If a node initiates snapshot collection, local snapshots are needed only from those nodes that have directly or transitively affected the initiator since their last snapshots. We prove that the global snapshot collection terminates within a finite time of its invocation and that the collected global snapshot is consistent. We also propose a minimal rollback/recovery algorithm in which the computation at a node is rolled back only if it depends on operations that have been undone due to the failure of one or more nodes. Both algorithms have low communication and storage overheads and meet the low energy consumption and low bandwidth constraints of mobile computing systems.
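The two selection rules in the abstract (checkpoint only the nodes that directly or transitively affected the initiator; roll back only the nodes that depend on undone work) can both be read as graph reachability. The Python sketch below illustrates that reading only and is not the paper's algorithm. It assumes that dependencies are already known as plain dictionaries recorded since each node's last snapshot, and that rollback is tracked per node rather than per operation. The names `reachable`, `nodes_to_checkpoint`, `nodes_to_roll_back`, `affected_by`, and `affects` are hypothetical.

```python
from collections import deque


def reachable(start, edges):
    """Return every node reachable from the nodes in `start` by following `edges`."""
    seen = set(start)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def nodes_to_checkpoint(initiator, affected_by):
    # Assumption: affected_by[n] holds the nodes whose messages n has received,
    # restricted to dependencies created since the senders' last snapshots.
    # The result is the initiator plus everything that affected it,
    # directly or transitively.
    return reachable({initiator}, affected_by)


def nodes_to_roll_back(failed, affects):
    # Assumption: affects[n] holds the nodes that received messages from n
    # after n's last checkpoint. Node-level simplification: a node is rolled
    # back if it depends, directly or transitively, on a failed node.
    return reachable(set(failed), affects)


# Hypothetical dependency history: C sent to B, B sent to A, A sent to D.
affected_by = {"A": {"B"}, "B": {"C"}, "D": {"A"}}
affects = {"C": {"B"}, "B": {"A"}, "A": {"D"}}

print(sorted(nodes_to_checkpoint("A", affected_by)))  # D is not asked to checkpoint
print(sorted(nodes_to_roll_back({"B"}, affects)))     # C is not rolled back
```

In this example, initiator A triggers snapshots at A, B, and C but not at D, because D has not affected A. A failure at B rolls back B, A, and D but leaves C untouched, because C does not depend on anything B did.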
Similar resources
An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...
Failure Recovery based on Quasi-Synchronous Checkpointing in Mobile Computing Systems
Mobile computing systems are expected to revolutionize the way computers are used. Mobile hosts have small memory, a relatively slow processor and low power batteries, and communicate over low bandwidth wireless communication links. In this paper, we address the problem of failure recovery in mobile computing systems. Any recovery method for mobile computing systems should take into considerati...
An Efficient Recovery Scheme for Mobile Computing Environments
This paper presents an efficient recovery scheme to provide fault-tolerance for mobile computing systems. The proposed scheme is based on message logging and independent checkpointing, since the checkpointing-only schemes are not suitable for the mobile environment in which unreliable mobile hosts and fragile network connection may hinder any kind of coordination for checkpointing and recovery....
Efficient Checkpoint-based Failure Recovery Techniques in Mobile Computing Systems
Conventional distributed and domino effect-free failure recovery techniques are inappropriate for mobile computing systems because each mobile host is forced to take a new checkpoint (based on coordinated checkpointing). Otherwise, multiple local checkpoints may need to be stored in stable storage (based on communication-induced checkpointing). Hence, this investigation presents a novel domino ...
A Survey and Performance Analysis of Checkpointing and Recovery Schemes for Mobile Computing Systems
A Survey and Performance Analysis of Checkpointing and Recovery Schemes for Mobile Computing Systems. Ruchi Tuli (1) and Parveen Kumar (2). (1) Yanbu University College, Royal Commission for Jubail and Yanbu, Directorate General for Yanbu, P.O. Box 30436, Madinat Yanbu Al Sinaiyah, Kingdom of Saudi Arabia. (2) Merrut Institute of Engineering and Technology, Merrut (INDIA) ...
Journal title:
- IEEE Trans. Parallel Distrib. Syst.
Volume 7
Publication date: 1996